Learning Head-modifier Pairs to Improve Lexicalized Dependency Parsing on a Chinese Treebank

Authors

Kun Yu

Daisuke Kawahara

Sadao Kurohashi

Abstract

Due to the data sparseness problem, the lexical information from a treebank for a lexicalized parser could be insufficient. This paper proposes an approach to learn head-modifier pairs from a raw corpus, and to integrate them into a lexicalized dependency parser to parse a Chinese Treebank. Experimental results show that this approach not only enlarged the coverage of bi-lexical dependency, but also improved the accuracy of dependency parsing significantly.

Similar resources

Cascaded Classification for High Quality Head-modifier Pair Selection

This paper presents a cascaded classification approach for selecting high-quality head-modifier pairs from syntactically analyzed sentences. Experimental results show that the proposed approach achieved an F-score of 76.11% on the selected head-modifier pairs, which was 8.54% higher than a baseline approach that uses sentence length as the selection criterion. In addition, compared with using the he...

Chapter 1: Lexicalized PCFG: Parsing Czech

Recent work in statistical parsing of English has used lexicalized trees as a representation, and has exploited parameterizations that lead to probabilities directly associated with dependencies between pairs of words in the tree structure. Parsed corpora such as the Penn treebank have generally been sets of sentence/tree pairs: typically, hand-coded rules are used to assign head-words to each ...

Bootstrapping Lexicalized Models in Memory-Based Dependency Parsing

Previous research has shown that a lexicalized parsing model incorporating words but no parts-of-speech can outperform a model involving parts-of-speech but no words, given enough training data for supervised learning. We show that the same effect can be achieved with a bootstrapping approach, where a mixed model trained on a small treebank is used to parse a larger corpus which is used as traini...

Lexicalized Beam Thresholding Parsing with Prior and Boundary Estimates

We use prior and boundary estimates as approximations of the outside probability and base our beam thresholding strategies on these estimates. Lexical items, e.g. head word and head tag, are also incorporated into the lexicalized prior and boundary estimates. Experiments on the Penn Chinese Treebank show that beam thresholding with lexicalized prior works much better than that with unlexica...

Hybrid Constituent and Dependency Parsing with Tsinghua Chinese Treebank

In this paper, we describe our hybrid parsing model for Mandarin Chinese processing. The model combines mainstream constituent and dependency parsing, and the dataset we use is the Tsinghua Chinese Treebank, whose annotation has both constituent and head information. We show the adaptation of this annotation scheme to the normal constituent structure, dependency structure, and the integration of ...

Publication date: 2007
